18 research outputs found

Lyndon Array Construction during Burrows-Wheeler Inversion

Author: Louza Felipe A.
Manzini Giovanni
Smyth W. F.
Telles Guilherme P.
Publication venue: 'Elsevier BV'
Publication date: 27/10/2017

In this paper we present an algorithm to compute the Lyndon array of a string $T$ of length $n$ as a byproduct of the inversion of the Burrows-Wheeler transform of $T$. Our algorithm runs in linear time using only a stack in addition to the data structures used for Burrows-Wheeler inversion. We compare our algorithm with two other linear-time algorithms for Lyndon array construction and show that computing the Burrows-Wheeler transform and then constructing the Lyndon array is competitive compared to the known approaches. We also propose a new balanced parenthesis representation for the Lyndon array that uses $2n+o(n)$ bits of space and supports constant time access. This representation can be built in linear time using $O(n)$ words of space, or in $O(n\log n/\log\log n)$ time using asymptotically the same space as $T$.
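
To make the object in this abstract concrete, the sketch below computes a Lyndon array (for each position, the length of the longest Lyndon word starting there) with a single stack. It is not the paper's BWT-inversion algorithm: it assumes the well-known characterisation via the next lexicographically smaller suffix, and it uses a naive suffix sort, so it only suits short inputs. The function names `suffix_array` and `lyndon_array` are illustrative.

```python
def suffix_array(text):
    # Naive construction by sorting suffixes; an assumption made for brevity,
    # not the linear-time machinery the abstract refers to.
    return sorted(range(len(text)), key=lambda i: text[i:])


def lyndon_array(text):
    n = len(text)
    # rank[i] = position of suffix i in lexicographic order (inverse suffix array).
    rank = [0] * n
    for r, i in enumerate(suffix_array(text)):
        rank[i] = r

    lyn = [0] * n
    stack = []  # positions whose next smaller suffix has not been seen yet
    for j in range(n):
        # Suffix j is smaller than the suffix on top of the stack:
        # the longest Lyndon word starting at that position ends just before j.
        while stack and rank[stack[-1]] > rank[j]:
            i = stack.pop()
            lyn[i] = j - i
        stack.append(j)
    # Positions with no smaller suffix to their right extend to the end.
    for i in stack:
        lyn[i] = n - i
    return lyn


print(lyndon_array("banana"))  # [1, 2, 1, 2, 1, 1]
```

Every position is pushed and popped at most once, so the stack pass itself is linear; the naive suffix sort dominates the cost in this sketch.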

arXiv.org e-Print Archive

Archivio della Ricerca - Università di Pisa

Research Repository

Archivio Istituzionale della Ricerca- Università del Piemonte Orientale

Space efficient merging of de Bruijn graphs and Wheeler graphs

Author: Egidi Lavinia
Louza Felipe A.
Manzini Giovanni
Publication date: 12/07/2021

The merging of succinct data structures is a well established technique for the space efficient construction of large succinct indexes. In the first part of the paper we propose a new algorithm for merging succinct representations of de Bruijn graphs. Our algorithm has the same asymptotic cost as the state of the art algorithm for the same problem but it uses less than half of its working space. A novel important feature of our algorithm, not found in any of the existing tools, is that it can compute the Variable Order succinct representation of the union graph within the same asymptotic time/space bounds. In the second part of the paper we consider the more general problem of merging succinct representations of Wheeler graphs, a recently introduced graph family which includes as special cases de Bruijn graphs and many other known succinct indexes based on the BWT or one of its variants. We show that Wheeler graphs merging is in general a much more difficult problem, and we provide a space efficient algorithm for the slightly simplified problem of determining whether the union graph has an ordering that satisfies the Wheeler conditions.

Comment: 24 pages, 10 figures. arXiv admin note: text overlap with arXiv:1902.0288

arXiv.org e-Print Archive

Archivio Istituzionale della Ricerca- Università del Piemonte Orientale

Lossy Compressor preserving variant calling through Extended BWT

Author: Guerrini Veronica
Louza Felipe A.
Rosone Giovanna
Publication venue: 'Scitepress'
Publication date: 17/04/2023

A standard format used for storing the output of high-throughput sequencing experiments is the FASTQ format. It comprises three main components: (i) headers, (ii) bases (nucleotide sequences), and (iii) quality scores. FASTQ files are widely used for variant calling, where sequencing data are mapped into a reference genome to discover variants that may be used for further analysis. There are many specialized compressors that exploit redundancy in FASTQ data with the focus only on either the bases or the quality scores components. In this paper we consider the novel problem of lossy compressing, in a reference-free way, FASTQ data by modifying both components at the same time, while preserving the important information of the original FASTQ. We introduce a general strategy, based on the Extended Burrows-Wheeler Transform (EBWT) and positional clustering, and we present implementations in both internal memory and external memory. Experimental results show that the lossy compression performed by our tool is able to achieve good compression while preserving information relating to variant calling more than the competitors. Availability: the software is freely available at https://github.com/veronicaguerrini/BFQzip.

Comment: Proceedings of the 15th International Joint Conference on Biomedical Engineering Systems and Technologies

arXiv.org e-Print Archive

A Grammar Compression Algorithm based on Induced Suffix Sorting

Author: Ayala-Rincón Mauricio
Gog Simon
Louza Felipe A.
Navarro Gonzalo
Nunes Daniel Saad Nogueira
Publication date: 08/11/2017

We introduce GCIS, a grammar compression algorithm based on the induced suffix sorting algorithm SAIS, introduced by Nong et al. in 2009. Our solution builds on the factorization performed by SAIS during suffix sorting. We construct a context-free grammar on the input string which can be further reduced into a shorter string by substituting each substring by its corresponding factor. The resulting grammar is encoded by exploring some redundancies, such as common prefixes between suffix rules, which are sorted according to the SAIS framework. When compared to well-known compression tools such as Re-Pair and 7-zip, our algorithm is competitive and very effective at handling repetitive strings regarding compression ratio, compression and decompression running time.

arXiv.org e-Print Archive

Crossref

Repositorio Académico de la Universidad de Chile

External memory BWT and LCP computation for sequence collections with applications

Author: Egidi Lavinia
Louza Felipe A.
Manzini Giovanni
Telles Guilherme P.
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 18th International Workshop on Algorithms in Bioinformatics (WABI 2018)
Publication date: 01/01/2018

We propose an external memory algorithm for the computation of the BWT and LCP array for a collection of sequences. Our algorithm takes the amount of available memory as an input parameter, and tries to make the best use of it by splitting the input collection into subcollections sufficiently small that it can compute their BWT in RAM using an optimal linear time algorithm. Next, it merges the partial BWTs in external memory and in the process it also computes the LCP values. We show that our algorithm performs O(n maxlcp) sequential I/Os, where n is the total length of the collection and maxlcp is the maximum LCP value. The experimental results show that our algorithm outperforms the current best algorithm for collections of sequences with different lengths and when the average LCP of the collection is relatively small compared to the length of the sequences. In the second part of the paper, we show that our algorithm can be modified to output two additional arrays that, combined with the BWT and LCP arrays, provide simple, scan based, external memory algorithms for three well known problems in bioinformatics: the computation of the all pairs suffix-prefix overlaps, the computation of maximal repeats, and the construction of succinct de Bruijn graphs.

arXiv.org e-Print Archive

Directory of Open Access Journals

Archivio della Ricerca - Università di Pisa

Dagstuhl Research Online Publication Server

Archivio Istituzionale della Ricerca- Università del Piemonte Orientale

Analysis of structural brain asymmetries in attention-deficit/hyperactivity disorder in 39 datasets

Author: Ambrosino Sara
Asherson Philip
Banaschewski Tobias
Bandeira Cibele E
Baranov Alexandr
Bau Claiton H D
Baumeister Sarah
Baur-Streubel Ramona
Bellgrove Mark A
Biederman Joseph
Bralten Janita
Brandeis Daniel
Brem Silvia
Buitelaar Jan K
Busatto Geraldo F
Castellanos Francisco X
Cercignani Mara
Chaim-Avancini Tiffany M
Chantiluke Kaylita C
Christakou Anastasia
Coghill David
Conzelmann Annette
Cubillo Ana I
Cupertino Renata B
de Zeeuw Patrick
Doyle Alysa E
Durston Sarah
Earl Eric A
Epstein Jeffery N
Ethofer Thomas
Fair Damien A
Fallgatter Andreas J
Faraone Stephen V
Fisher Simon E.
Francks Clyde
Franke Barbara
Frodl Thomas
Gabel Matt C
Glahn David C.
Gogberashvili Tinatin
Grevet Eugenio H
Haavik Jan
Harrison Neil A
Hartman Catharina A
Heslenfeld Dirk J
Hoekstra Pieter J
Hohmann Sarah
Hoogman Martine
Høvik Marie F
Jahanshad Neda
Jernigan Terry L
Kardatzki Bernd
Karkashadze Georgii
Kelly Clare
Kohls Gregor
Konrad Kerstin
Kuntsi Jonna
Lazaro Luisa
Lera-Miguel Sara
Lesch Klaus-Peter
Louza Mario R
Lundervold Astri J
Malpas Charles B
Mattos Paulo
McCarthy Hazel
Medland Sarah E.
Namazova-Baranova Leyla
Nicolau Rosa
Nigg Joel T
Novotny Stephanie E
O'Gorman Tuura Ruth L
Oberwelland Weiss Eileen
Oosterlaan Jaap
Oranje Bob
Paloyelis Yannis
Pauli Paul
Picon Felipe A.
Plessen Kerstin J
Postema Merel C
Ramos-Quiroga J Antoni
Reif Andreas
Reneman Liesbeth
Rosa Pedro G P
Rubia Katya
Schrantee Anouk
Schweren Lizanne J S
Seitz Jochen
Shaw Philip
Silk Tim J
Skokauskas Norbert
Soliva Vila Juan Carlos
Stevens Michael C
Sudre Gustavo
Tamm Leanne
Thompson Paul M
Tovar-Moll Fernanda
van Erp Theo G M
Vance Alasdair
Vilarroya Oscar
Vives-Gilabert Yolanda
von Polier Georg G
Walitza Susanne
Yoncheva Yuliya N
Zanetti Marcus V
Ziegler Georg C
Publication venue: 'Wiley'
Publication date: 01/01/2021

Objective: Some studies have suggested alterations of structural brain asymmetry in attention-deficit/hyperactivity disorder (ADHD), but findings have been contradictory and based on small samples. Here, we performed the largest ever analysis of brain left-right asymmetry in ADHD, using 39 datasets of the ENIGMA consortium. Methods: We analyzed asymmetry of subcortical and cerebral cortical structures in up to 1,933 people with ADHD and 1,829 unaffected controls. Asymmetry Indexes (AIs) were calculated per participant for each bilaterally paired measure, and linear mixed effects modeling was applied separately in children, adolescents, adults, and the total sample, to test exhaustively for potential associations of ADHD with structural brain asymmetries. Results: There was no evidence for altered caudate nucleus asymmetry in ADHD, in contrast to prior literature. In children, there was less rightward asymmetry of the total hemispheric surface area compared to controls (t = 2.1, p = .04). Lower rightward asymmetry of medial orbitofrontal cortex surface area in ADHD (t = 2.7, p = .01) was similar to a recent finding for autism spectrum disorder. There were also some differences in cortical thickness asymmetry across age groups. In adults with ADHD, globus pallidus asymmetry was altered compared to those without ADHD. However, all effects were small (Cohen’s d from −0.18 to 0.18) and would not survive study-wide correction for multiple testing. Conclusion: Prior studies of altered structural brain asymmetry in ADHD were likely underpowered to detect the small effects reported here. Altered structural asymmetry is unlikely to provide a useful biomarker for ADHD, but may provide neurobiological insights into the trait.

Optimal Suffix Sorting and LCP Array Construction for Constant Alphabets

Author: Louza Felipe A.
Gog Simon
Telles Guilherme P.
Publication venue: 'Elsevier BV'
Publication date: 13/11/2017

We show how the longest common prefix (LCP) array can be generated as a by-product of the suffix array construction algorithm SACA-K (Nong, 2013). Our algorithm builds on Fischer's proposal (Fischer, WADS'11), and also runs in linear time, but uses only constant extra memory for constant alphabets. (C) 2016 Elsevier B.V. All rights reserved.

Funding: Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES); Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) [162338/2015-5]
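
For readers unfamiliar with the LCP array, the following sketch shows what is being computed. It deliberately swaps in a different, simpler technique: Kasai's algorithm, which derives the LCP array from an already built suffix array, rather than producing it during SACA-K as the abstract describes. The naive `suffix_array` helper and both function names are assumptions made for illustration.

```python
def suffix_array(text):
    # Naive suffix sorting, assumed here only to keep the example short.
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_kasai(text, sa):
    """lcp[r] = length of the longest common prefix of the suffixes
    at sa[r - 1] and sa[r]; lcp[0] is defined as 0."""
    n = len(text)
    rank = [0] * n
    for r, i in enumerate(sa):
        rank[i] = r

    lcp = [0] * n
    h = 0  # length of the match carried over from the previous text position
    for i in range(n):
        if rank[i] == 0:
            h = 0
            continue
        j = sa[rank[i] - 1]  # suffix preceding suffix i in sorted order
        while i + h < n and j + h < n and text[i + h] == text[j + h]:
            h += 1
        lcp[rank[i]] = h
        if h > 0:
            h -= 1  # the next suffix shares at least h - 1 characters
    return lcp


sa = suffix_array("banana")
print(sa)                       # [5, 3, 1, 0, 4, 2]
print(lcp_kasai("banana", sa))  # [0, 1, 3, 0, 0, 2]
```

Kasai's method needs the inverse suffix array as extra working memory, which is exactly the kind of overhead the abstract's constant-extra-memory result avoids.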

Repositorio da Producao Cientifica e Intelectual da Unicamp

Space Efficient Merging of de Bruijn Graphs and Wheeler Graphs

Author: Felipe A. Louza
Giovanni Manzini
Lavinia Egidi
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2022

The merging of succinct data structures is a well established technique for the space efficient construction of large succinct indexes. In the first part of the paper we propose a new algorithm for merging succinct representations of de Bruijn graphs. Our algorithm has the same asymptotic cost as the state of the art algorithm for the same problem but it uses less than half of its working space. A novel important feature of our algorithm, not found in any of the existing tools, is that it can compute the Variable Order succinct representation of the union graph within the same asymptotic time/space bounds. In the second part of the paper we consider the more general problem of merging succinct representations of Wheeler graphs, a recently introduced graph family which includes as special cases de Bruijn graphs and many other known succinct indexes based on the BWT or one of its variants. In this paper we provide a space efficient algorithm for Wheeler graph merging; our algorithm works under the assumption that the union of the input Wheeler graphs has an ordering that satisfies the Wheeler conditions and which is compatible with the ordering of the original graphs.

Archivio Istituzionale della Ricerca- Università del Piemonte Orientale

Algorithms to compute the Burrows-Wheeler similarity distribution

Author: Gog Simon
Louza Felipe A.
Telles Guilherme P.
Zhao Liang
Publication venue: 'Elsevier BV'
Publication date: 30/04/2020

The Burrows-Wheeler transform (BWT) is a well studied text transformation widely used in data compression and text indexing. The BWT of two strings can also provide similarity measures between them, based on the observation that the more their symbols are intermixed in the transformation, the more the strings are similar. In this article we present two new algorithms to compute similarity measures based on the BWT for string collections. In particular, we present practical and theoretical improvements to the computation of the Burrows-Wheeler Similarity Distribution for all pairs of strings in a collection. Our algorithms take advantage of the BWT computed for the concatenation of all strings, and use compressed data structures that allow reducing the running time with a small memory footprint, as shown by a set of experiments with real and artificial datasets.

Funding: Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), grants 303012/2015-3; 425340/2016-3; 310685/2015-0. Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP), grants 2017/09105-0; 2018/21509-2; 2015/50122-
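
The intermixing idea in this abstract can be illustrated with a small sketch. It is a simplified formulation and may differ in details from the definition used in the paper: the suffixes of two strings are sorted together, each is labelled by the string it came from, and the lengths of the runs of equal labels are summarised as a distribution. Short runs mean heavy intermixing. The separator characters `"\x01"` and `"\x00"`, the naive suffix sort, and all function names are assumptions of this sketch.

```python
from collections import Counter
from itertools import groupby
from math import log2


def origin_runs(s1, s2):
    # Assumption: "\x00" and "\x01" do not occur in s1 or s2.
    text = s1 + "\x01" + s2 + "\x00"
    n1 = len(s1)
    # Sort all suffixes that start inside s1 or s2 (separators are skipped).
    starts = [i for i in range(len(text)) if text[i] not in "\x00\x01"]
    starts.sort(key=lambda i: text[i:])
    # Label each sorted suffix by its source string: 0 for s1, 1 for s2.
    labels = [0 if i < n1 else 1 for i in starts]
    # Lengths of maximal runs of equal labels.
    return [len(list(group)) for _, group in groupby(labels)]


def run_length_distribution(runs):
    # Fraction of runs having each length.
    counts = Counter(runs)
    return {k: c / len(runs) for k, c in sorted(counts.items())}


def expectation(dist):
    return sum(k * p for k, p in dist.items())


def entropy(dist):
    return -sum(p * log2(p) for p in dist.values() if p > 0)


print(origin_runs("ab", "ab"))    # [1, 1, 1, 1]  (fully intermixed)
print(origin_runs("aaa", "bbb"))  # [3, 3]        (not intermixed at all)
```

Under this formulation, identical strings give runs of length 1 throughout, so the expected run length is 1, while strings over disjoint alphabets give two long runs.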

Repositorio da Producao Cientifica e Intelectual da Unicamp